82 research outputs found

Tight Lower Bounds for Greedy Routing in Higher-Dimensional Small-World Grids

Author: Dietzfelbinger Martin
Woelfel Philipp
Publication date: 06/05/2013

We consider Kleinberg's celebrated small world graph model (Kleinberg, 2000), in which a D-dimensional grid {0,...,n-1}^D is augmented with a constant number of additional unidirectional edges leaving each node. These long range edges are determined at random according to a probability distribution (the augmenting distribution), which is the same for each node. Kleinberg suggested using the inverse D-th power distribution, in which node v is the long range contact of node u with a probability proportional to ||u-v||^(-D). He showed that such an augmenting distribution allows a message to be routed efficiently in the resulting random graph: The greedy algorithm, where in each intermediate node the message travels over a link that brings the message closest to the target w.r.t. the Manhattan distance, finds a path of expected length O(log^2 n) between any two nodes. In this paper we prove that greedy routing does not perform asymptotically better for any uniform and isotropic augmenting distribution, i.e., the probability that node u has a particular long range contact v is independent of the labels of u and v and only a function of ||u-v||. In order to obtain the result, we introduce a novel proof technique: We define a budget game, in which a token travels over a game board, while the player manages a "probability budget". In each round, the player bets part of her remaining probability budget on step sizes. A step size is chosen at random according to a probability distribution of the player's bet. The token then makes progress as determined by the chosen step size, while some of the player's bet is removed from her probability budget. We prove a tight lower bound for such a budget game, and then obtain a lower bound for greedy routing in the D-dimensional grid by a reduction.
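
The greedy routing rule described in this abstract can be sketched as follows. This is a minimal illustration, not code from the paper: the function names are hypothetical, each node gets exactly one long range contact (the model allows any constant number), and contacts are sampled lazily when a node is visited, which is equivalent here because greedy routing never visits a node twice.

```python
import random
from itertools import product


def manhattan(u, v):
    """Manhattan (L1) distance between two grid points."""
    return sum(abs(a - b) for a, b in zip(u, v))


def sample_long_range_contact(u, n, D, rng):
    """Draw one contact v != u with probability proportional to ||u-v||^(-D)."""
    candidates = [v for v in product(range(n), repeat=D) if v != u]
    weights = [manhattan(u, v) ** (-D) for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def grid_neighbors(u, n):
    """Yield the grid neighbors of u inside {0,...,n-1}^D."""
    for i in range(len(u)):
        for step in (-1, 1):
            c = u[i] + step
            if 0 <= c < n:
                yield u[:i] + (c,) + u[i + 1:]


def greedy_route(source, target, n, D, rng):
    """Return the number of hops greedy routing takes from source to target."""
    current, hops = source, 0
    while current != target:
        options = list(grid_neighbors(current, n))
        options.append(sample_long_range_contact(current, n, D, rng))
        # Greedy rule: take the link whose endpoint is closest to the target.
        current = min(options, key=lambda v: manhattan(v, target))
        hops += 1
    return hops


if __name__ == "__main__":
    rng = random.Random(1)
    print(greedy_route((0, 0), (14, 14), n=15, D=2, rng=rng))
```

Because some grid neighbor always reduces the distance to the target by one, the loop terminates after at most ||source-target|| hops; the long range contacts can only shorten the route.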

arXiv.org e-Print Archive

Crossref

Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications

Author: Dietzfelbinger Martin
Walzer Stefan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019

In this paper we identify a new class of sparse near-quadratic random Boolean matrices that have full row rank over F_2 = {0,1} with high probability and can be transformed into echelon form in almost linear time by a simple version of Gauss elimination. The random matrix with dimensions n(1-epsilon) x n is generated as follows: In each row, identify a block of length L = O((log n)/epsilon) at a random position. The entries outside the block are 0, the entries inside the block are given by fair coin tosses. Sorting the rows according to the positions of the blocks transforms the matrix into a kind of band matrix, on which, as it turns out, Gauss elimination works very efficiently with high probability. For the proof, the effects of Gauss elimination are interpreted as a ("coin-flipping") variant of Robin Hood hashing, whose behaviour can be captured in terms of a simple Markov model from queuing theory. Bounds for expected construction time and high success probability follow from results in this area. They readily extend to larger finite fields in place of F_2. By employing hashing, this matrix family leads to a new implementation of a retrieval data structure, which represents an arbitrary function f: S -> {0,1} for some set S of m = (1-epsilon)n keys. It requires m/(1-epsilon) bits of space, construction takes O(m/epsilon^2) expected time on a word RAM, while queries take O(1/epsilon) time and access only one contiguous segment of O((log m)/epsilon) bits in the representation (O(1/epsilon) consecutive words on a word RAM). The method is readily implemented and highly practical, and it is competitive with state-of-the-art methods. In a more theoretical variant, which works only for unrealistically large S, we can even achieve construction time O(m/epsilon) and query time O(1), accessing O(1) contiguous memory words for a query. 
By well-established methods the retrieval data structure leads to efficient constructions of (static) perfect hash functions and (static) Bloom filters with almost optimal space and very local storage access patterns for queries.
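
The construction in this abstract can be illustrated with a small sketch, under several assumptions that are not taken from the paper: rows are stored as Python integers (bit j is column j), the hypothetical helper `row_for_key` stands in for the hash function by seeding a pseudo-random generator with the key, and the parameters n, L are picked by hand rather than from the O((log n)/epsilon) bound. The elimination is plain Gauss elimination over F_2 on rows sorted by block position, followed by back substitution.

```python
import random


def row_for_key(key, seed, n, L):
    """Hypothetical hash: one random L-bit block at a random start position."""
    rng = random.Random(f"{seed}:{key}")
    return rng.randrange(n - L + 1), rng.getrandbits(L)


def parity(x):
    """Parity of the number of 1 bits in x."""
    return bin(x).count("1") & 1


def build(table, n, L, seed):
    """Return an n-bit vector z with parity(row_x & z) == table[x], or None."""
    rows = []
    for key, value in table.items():
        start, pattern = row_for_key(key, seed, n, L)
        rows.append((start, pattern << start, value))
    rows.sort(key=lambda r: r[0])  # sorting by block start gives a band-like matrix
    pivots = {}  # pivot column -> (reduced row, right-hand side)
    for _, row, rhs in rows:
        while row:
            col = (row & -row).bit_length() - 1  # leftmost (lowest) column with a 1
            if col not in pivots:
                pivots[col] = (row, rhs)
                break
            prow, prhs = pivots[col]
            row ^= prow
            rhs ^= prhs
        else:
            if rhs:  # row reduced to zero but right-hand side is 1
                return None
    z = 0
    for col in sorted(pivots, reverse=True):  # back substitution
        row, rhs = pivots[col]
        z |= (rhs ^ parity(row & z)) << col
    return z


def query(z, key, n, L, seed):
    """Evaluate the stored function; reads only L contiguous bits of z."""
    start, pattern = row_for_key(key, seed, n, L)
    return parity((z >> start) & pattern)


def build_with_retries(table, n, L, max_tries=20):
    """Try fresh seeds until the linear system is solvable."""
    for seed in range(max_tries):
        z = build(table, n, L, seed)
        if z is not None:
            return z, seed
    raise RuntimeError("no solvable system found")


if __name__ == "__main__":
    table = {f"key{i}": i % 2 for i in range(240)}
    z, seed = build_with_retries(table, n=400, L=40)
    print(all(query(z, k, 400, 40, seed) == v for k, v in table.items()))
```

Sorting is not needed for correctness in this sketch; it mirrors the band structure that, according to the abstract, makes elimination fast. Keys outside the table return an arbitrary bit, as is usual for retrieval data structures.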

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Digitale Bibliothek Thüringen

A More Reliable Greedy Heuristic for Maximum Matchings in Sparse Random Graphs

Author: Dietzfelbinger Martin
Peilke Hendrik
Rink Michael
Publication date: 19/03/2012

We propose a new greedy algorithm for the maximum cardinality matching problem. We give experimental evidence that this algorithm is likely to find a maximum matching in random graphs with constant expected degree c>0, independent of the value of c. This is contrary to the behavior of commonly used greedy matching heuristics which are known to have some range of c where they probably fail to compute a maximum matching.

arXiv.org e-Print Archive

CiteSeerX

On randomness in Hash functions

Author: Dietzfelbinger Martin
Publication date: 01/01/2012

In the talk, we shall discuss quality measures for hash functions used in data structures and algorithms, and survey positive and negative results. (This talk is not about cryptographic hash functions.) For the analysis of algorithms involving hash functions, it is often convenient to assume the hash functions used behave fully randomly; in some cases there is no analysis known that avoids this assumption. In practice, one needs to get by with weaker hash functions that can be generated by randomized algorithms. A well-studied range of applications concerns realizations of dynamic dictionaries (linear probing, chained hashing, dynamic perfect hashing, cuckoo hashing and its generalizations) or Bloom filters and their variants. A particularly successful and useful means of classification is Carter and Wegman's universal or k-wise independent classes, introduced in 1977. A natural and widely used approach to analyzing an algorithm involving hash functions is to show that it works if a sufficiently strong universal class of hash functions is used, and to substitute one of the known constructions of such classes. This invites research into the question of just how much independence in the hash functions is necessary for an algorithm to work. Some recent analyses that gave impossibility results constructed rather artificial classes that would not work; other results pointed out natural, widely used hash classes that would not work in a particular application. Only recently was it shown that under certain assumptions on some entropy present in the set of keys even 2-wise independent hash classes will lead to strong randomness properties in the hash values. The negative results show that these results may not be taken as justification for using weak hash classes indiscriminately, in particular for key sets with structure. When stronger independence properties are needed for a theoretical analysis, one may resort to classic constructions.
Only in 2003 was it discovered how full randomness can be simulated using only linear space overhead (which is optimal). The "split-and-share" approach can be used to justify the full randomness assumption in some situations in which full randomness is needed for the analysis to go through, like in many applications involving multiple hash functions (e.g., generalized versions of cuckoo hashing with multiple hash functions or larger bucket sizes, load balancing, Bloom filters and variants, or minimal perfect hash function constructions). For practice, efficiency considerations beyond constant factors are important. It is not hard to construct very efficient 2-wise independent classes. Using k-wise independent classes for constant k bigger than 3 has become feasible in practice only by new constructions involving tabulation. This goes together well with the quite new result that linear probing works with 5-independent hash functions. Recent developments suggest that the classification of hash function constructions by their degree of independence alone may not be adequate in some cases. Thus, one may want to analyze the behavior of specific hash classes in specific applications, circumventing the concept of k-wise independence. Several such results were recently achieved concerning hash functions that utilize tabulation. In particular if the analysis of the application involves using randomness properties in graphs and hypergraphs (generalized cuckoo hashing, also in the version with a "stash", or load balancing), a hash class combining k-wise independence with tabulation has turned out to be very powerful.
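
Two of the hash classes mentioned in this abstract can be sketched briefly. The code is an illustration under assumptions of my own choosing, not material from the talk: keys are non-negative integers below 2^32, the prime is the Mersenne prime 2^61 - 1, and the function names are hypothetical. The first class is a random degree-1 polynomial over Z_p, which is 2-wise independent on Z_p (the final reduction modulo m makes the output only approximately uniform). The second is simple tabulation, which XORs one random table entry per key byte.

```python
import random

P = (1 << 61) - 1  # Mersenne prime, assumed larger than every key


def make_two_wise_hash(m, rng):
    """h(x) = ((a*x + b) mod P) mod m with random a, b in Z_P."""
    a = rng.randrange(P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m


def make_simple_tabulation_hash(out_bits, rng, chars=4):
    """Simple tabulation for keys of `chars` bytes: XOR of per-byte table lookups."""
    tables = [[rng.getrandbits(out_bits) for _ in range(256)] for _ in range(chars)]

    def h(x):
        value = 0
        for i in range(chars):
            value ^= tables[i][(x >> (8 * i)) & 0xFF]
        return value

    return h


if __name__ == "__main__":
    rng = random.Random(0)
    h = make_two_wise_hash(100, rng)
    t = make_simple_tabulation_hash(16, rng)
    print(h(42), t(42))
    # Four keys forming a "rectangle" always XOR to zero under simple tabulation,
    # so the class cannot be 4-wise independent.
    print(t(0) ^ t(1) ^ t(256) ^ t(257))
```

The last line illustrates the abstract's point that the degree of independence alone does not capture how a class behaves: simple tabulation has low formal independence, yet analyses of specific applications can still go through for it.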

Dagstuhl Research Online Publication Server

Digitale Bibliothek Thüringen

Succinct Data Structures for Retrieval and Approximate Membership

Author: Dietzfelbinger Martin
Pagh Rasmus
Publication date: 01/01/2008

The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U -> {0,1}^r that has specified values on the elements of a given set S, a subset of U, |S|=n, but may have any value on elements outside S. Minimal perfect hashing makes it possible to avoid storing the set S, but this induces a space overhead of Theta(n) bits in addition to the nr bits needed for function values. In this paper we show how to eliminate this overhead. Moreover, we show that for any k, query time O(k) can be achieved using space that is within a factor 1+e^{-k} of optimal, asymptotically for large n. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits whp. The time to construct the data structure is O(n), expected. A main technical ingredient is to utilize existing tight bounds on the probability of almost square random matrices with rows of low weight to have full row rank. In addition to direct constructions, we point out a close connection between retrieval structures and hash tables where keys are stored in an array and some kind of probing scheme is used. Further, we propose a general reduction that transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Again, we show how to eliminate the space overhead present in previously known methods, and get arbitrarily close to the lower bound. The evaluation procedures of our data structures are extremely simple (similar to a Bloom filter). For the results stated above we assume free access to fully random hash functions. However, we show how to justify this assumption using extra space o(n) to simulate full randomness on a RAM.

arXiv.org e-Print Archive

CiteSeerX

The IT University of Copenhagen's Repository

Towards Optimal Degree-distributions for Left-perfect Matchings in Random Bipartite Graphs

Author: Fakultät Für Informatik Und Automatisierung
Martin Dietzfelbinger
Michael Rink
Publication date: 27/04/2012

Consider a random bipartite multigraph $G$ with $n$ left nodes and $m \geq n \geq 2$ right nodes. Each left node $x$ has $d_x \geq 1$ random right neighbors. The average left degree $\Delta$ is fixed, $\Delta \geq 2$. We ask whether for the probability that $G$ has a left-perfect matching it is advantageous not to fix $d_x$ for each left node $x$ but rather choose it at random according to some (cleverly chosen) distribution. We show the following, provided that the degrees of the left nodes are independent: If $\Delta$ is an integer then it is optimal to use a fixed degree of $\Delta$ for all left nodes. If $\Delta$ is non-integral then an optimal degree-distribution has the property that each left node $x$ has two possible degrees, $\lfloor\Delta\rfloor$ and $\lceil\Delta\rceil$, with probability $p_x$ and $1-p_x$, respectively, where $p_x$ is from the closed interval $[0,1]$ and the average over all $p_x$ equals $\lceil\Delta\rceil-\Delta$. Furthermore, if $n=c\cdot m$ and $\Delta>2$ is constant, then each distribution of the left degrees that meets the conditions above determines the same threshold $c^*(\Delta)$ that has the following property as $n$ goes to infinity: If $c<c^*(\Delta)$ then there exists a left-perfect matching with high probability. If $c>c^*(\Delta)$ then there exists no left-perfect matching with high probability. The threshold $c^*(\Delta)$ is the same as the known threshold for offline $k$-ary cuckoo hashing for integral or non-integral $k=\Delta$.

arXiv.org e-Print Archive

CiteSeerX

Dense peelable random uniform hypergraphs

Author: Dietzfelbinger Martin
Walzer Stefan
Publication date: 01/01/2019

We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. to admit no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled "from the outside in". The density thresholds f_k for peelability of our hypergraphs (f_3 ~ 0.918, f_4 ~ 0.977, f_5 ~ 0.992, ...) are well beyond the corresponding thresholds (c_3 ~ 0.818, c_4 ~ 0.772, c_5 ~ 0.702, ...) of standard k-uniform random hypergraphs. To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on [0,1]^Z and f_k can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods. Random hypergraphs underlie the construction of various data structures based on hashing, for instance invertible Bloom filters, perfect hash functions, retrieval data structures, error correcting codes and cuckoo hash tables, where inputs are mapped to edges using hash functions. Frequently, the data structures rely on peelability of the hypergraph or peelability allows for simple linear time algorithms. Memory efficiency is closely tied to edge density while worst and average case query times are tied to maximum and average edge size. To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. [Fabiano Cupertino Botelho et al., 2013]. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time. 
Using k > 3 attains, at small sacrifices in running time, further improvements to memory usage.
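
Peelability, as defined in this abstract, can be checked with a short routine. The sketch below rests on assumptions that are not from the paper: the function names are hypothetical, and the generator picks the first of the k consecutive segments uniformly at random, which may differ from the paper's exact edge distribution. The peeling routine itself is the standard one: repeatedly remove a vertex of degree 1 together with its only edge.

```python
import random
from collections import defaultdict


def segmented_hypergraph(num_edges, num_segments, segment_size, k, rng):
    """Each edge takes one random vertex from each of k consecutive segments."""
    edges = []
    for _ in range(num_edges):
        first = rng.randrange(num_segments - k + 1)
        edges.append(tuple(
            (first + j) * segment_size + rng.randrange(segment_size)
            for j in range(k)
        ))
    return edges


def peel(edges):
    """Return a peeling order [(edge index, vertex)], or None if a 2-core remains."""
    incident = defaultdict(set)
    for e, verts in enumerate(edges):
        for v in verts:
            incident[v].add(e)
    stack = [v for v, es in incident.items() if len(es) == 1]
    order = []
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue  # the vertex lost its last edge in the meantime
        (e,) = incident[v]
        order.append((e, v))
        for w in edges[e]:
            incident[w].discard(e)
            if len(incident[w]) == 1:
                stack.append(w)
    return order if len(order) == len(edges) else None


if __name__ == "__main__":
    rng = random.Random(0)
    edges = segmented_hypergraph(800, num_segments=100, segment_size=10, k=3, rng=rng)
    print("peelable" if peel(edges) is not None else "not peelable")
```

In hashing-based data structures the returned order is typically processed in reverse, assigning each edge's value through the vertex at which it was peeled.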

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Digitale Bibliothek Thüringen

How Good Is Multi-Pivot Quicksort?

Author: Aumüller Martin
Dietzfelbinger Martin
Klaue Pascal
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 31/05/2016

Multi-Pivot Quicksort refers to variants of classical quicksort where in the partitioning step k pivots are used to split the input into k + 1 segments. For many years, multi-pivot quicksort was regarded as impractical, but in 2009 a 2-pivot approach by Yaroslavskiy, Bentley, and Bloch was chosen as the standard sorting algorithm in Sun's Java 7. In 2014 at ALENEX, Kushagra et al. introduced an even faster algorithm that uses three pivots. This paper studies what possible advantages multi-pivot quicksort might offer in general. The contributions are as follows: Natural comparison-optimal algorithms for multi-pivot quicksort are devised and analyzed. The analysis shows that the benefits of using multiple pivots with respect to the average comparison count are marginal and these strategies are inferior to simpler strategies such as the well known median-of-k approach. A substantial part of the partitioning cost is caused by rearranging elements. A rigorous analysis of an algorithm for rearranging elements in the partitioning step is carried out, observing mainly how often array cells are accessed during partitioning. The algorithm behaves best if 3 to 5 pivots are used. Experiments show that this translates into good cache behavior and is closest to predicting observed running times of multi-pivot quicksort algorithms. Finally, it is studied how choosing pivots from a sample affects sorting cost. The study is theoretical in the sense that although the findings motivate design recommendations for multi-pivot quicksort algorithms that lead to running time improvements over known algorithms in an experimental setting, these improvements are small.
Comment: Submitted to a journal, v2: Fixed statement of Gibbs' inequality, v3: Revised version, especially improving on the experiments in Section
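
For the case k = 2, the partitioning step described in the abstract (two pivots, three segments) can be sketched as follows. This is a generic textbook-style dual-pivot quicksort with the first and last elements as pivots; it is an assumption-laden illustration, not the Java 7 implementation and not one of the comparison-optimal strategies analyzed in the paper.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort list a in place using two pivots p <= q per partitioning step."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]
    # Invariant: a[lo+1..lt-1] < p, a[lt..i-1] in [p, q], a[gt+1..hi-1] > q.
    lt, i, gt = lo + 1, lo + 1, hi - 1
    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    # Move the pivots to their final positions.
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]
    # Recurse on the k + 1 = 3 segments.
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)


if __name__ == "__main__":
    data = [5, 2, 9, 1, 5, 6, 0, 3]
    dual_pivot_quicksort(data)
    print(data)
```

Each element is compared with one or two pivots and moved at most a constant number of times per partitioning step; the abstract's point is that the number of such array accesses, more than the comparison count, explains observed running times.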

arXiv.org e-Print Archive

The IT University of Copenhagen's Repository